[pull] main from containerd:main #56
Open
pull[bot] wants to merge 5911 commits into
…40891b96e8 build(deps): bump the otel group with 6 updates
Add transfer types for container filesystem copy
The default walking applier performs a real temporary mount for unpacking, but the mount manager failed to adapt to the walking differ.

This fixes the EROFS snapshotter together with the default walking differ, otherwise it reports:

```
ctr: apply layer error for "[]": failed to extract layer sha256:[]: failed to mount /var/lib/containerd/tmpmounts/containerd-mount3992073457: internal mount option "X-containerd.mkfs.fs=ext4" was not consumed by the mount manager
```

Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
diff/walking: enable mount manager
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
Support reading readonly overlays without mounting
…b.com/Microsoft/hcsshim-0.15.0-rc.1 build(deps): bump github.com/Microsoft/hcsshim from 0.14.0-rc.1 to 0.15.0-rc.1
The shim "start" helper returns the named pipe address before the daemon process has created the pipe via winio.ListenPipe(). On busy Windows systems, containerd may try to connect before the pipe exists.

Add awaitPipeReady(): the start helper now polls the pipe address (up to 5s, 10ms intervals) before writing the bootstrap result to stdout. This follows hcsshim's readiness pattern where the shim verifies its endpoint is ready before signaling the parent.

As a safety net, also parameterize makeConnection() with a dialer so binary.Start() uses AnonDialer (retry) for new shims while loadShim() keeps AnonReconnectDialer (fail-fast) for reconnects per #3659.

On Unix, awaitPipeReady() is a no-op: domain sockets appear atomically.

Signed-off-by: Esteban Ginez <esteban.ginez@docker.com>
- Use time.NewTimer + Stop() instead of time.After to avoid timer leaks
- Treat context.DeadlineExceeded as retryable (pipe busy, not just missing)
- Wrap last dial error instead of os.ErrNotExist for better diagnostics
- Update makeConnection godoc to reflect current BootstrapResult type

Signed-off-by: Esteban Ginez <esteban.ginez@docker.com>
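The polling behavior described in the two commits above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `awaitReady`, its `dial` callback, and `demoAwaitReady` are hypothetical names, and the callback stands in for a platform-specific named-pipe dial. It shows a bounded poll that uses `time.NewTimer` with `Stop()`, treats `os.ErrNotExist` and `context.DeadlineExceeded` as retryable, and wraps the last dial error on timeout.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"time"
)

// awaitReady polls dial until it succeeds, returns a non-retryable error,
// or the overall timeout elapses. dial is a hypothetical stand-in for a
// platform-specific connect such as a named-pipe dial.
func awaitReady(ctx context.Context, dial func(context.Context) error, timeout, interval time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	var lastErr error
	for {
		err := dial(ctx)
		if err == nil {
			return nil
		}
		// Retry when the endpoint is missing or busy; fail fast otherwise.
		if !errors.Is(err, os.ErrNotExist) && !errors.Is(err, context.DeadlineExceeded) {
			return err
		}
		lastErr = err

		// An explicit timer can be stopped on the early-return path,
		// unlike the channel returned by time.After.
		timer := time.NewTimer(interval)
		select {
		case <-ctx.Done():
			timer.Stop()
			// Wrap the last dial error so the caller sees why dialing failed.
			return fmt.Errorf("endpoint not ready after %s: %w", timeout, lastErr)
		case <-timer.C:
		}
	}
}

// demoAwaitReady simulates an endpoint that only appears on the third attempt.
func demoAwaitReady() (attempts int, err error) {
	dial := func(context.Context) error {
		attempts++
		if attempts < 3 {
			return os.ErrNotExist
		}
		return nil
	}
	err = awaitReady(context.Background(), dial, 5*time.Second, 10*time.Millisecond)
	return attempts, err
}

func main() {
	attempts, err := demoAwaitReady()
	fmt.Println(attempts, err)
}
```

The 5s and 10ms values mirror the numbers quoted in the commit message; everything else is an assumption made for the sketch.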
Temporarily disable uploading logs to GCP for windows periodic tests until GCP credentials are renewed
Document shim bootstrap behavior
…Windows. Signed-off-by: Apurv Barve <apurvbarve@microsoft.com>
…dows fix(windows): verify pipe readiness before returning shim address
Signed-off-by: HirazawaUi <695097494plus@gmail.com>
Uses the definition of valid grammar for this field from the OCI image annotations spec: https://github.com/opencontainers/image-spec/blob/e72ae99d5fc74e7f7f8e320a44f76968da86a545/annotations.md#pre-defined-annotation-keys On this commit the test will fail per the bug #10681 `manifest annotation org.opencontainers.image.ref.name ="@sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c" does not match required grammar` Signed-off-by: Laura Lorenz <lauralorenz@google.com>
Make utils.sh nounset-safe by never expanding unset CGROUP_DRIVER on Windows
Bump cri-api to v0.36.0-rc.0
Avoid using logrus concepts in the API, use slog style log levels with integer values and 0 meaning the default "info" level.

Signed-off-by: Derek McGowan <derek@mcg.dev>
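For reference, Go's `log/slog` package defines its levels as integers with `Info` at 0, so a zero-valued field naturally reads as the default. A small sketch of the mapping, assuming the API carries the level as an `int32` (the actual field type is not shown in this commit list, and `toSlogLevel` is a hypothetical helper):

```go
package main

import (
	"fmt"
	"log/slog"
)

// toSlogLevel converts an integer level, as it might be carried in an API
// message, into a slog.Level. The zero value maps to slog.LevelInfo.
// The int32 parameter type is an assumption for this sketch.
func toSlogLevel(v int32) slog.Level {
	return slog.Level(v)
}

func main() {
	// slog's predefined levels: Debug=-4, Info=0, Warn=4, Error=8.
	for _, v := range []int32{-4, 0, 4, 8} {
		fmt.Println(v, toSlogLevel(v))
	}
}
```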
Update bootstrap API log level definition
Signed-off-by: Derek McGowan <derek@mcg.dev>
Signed-off-by: Derek McGowan <derek@mcg.dev>
Prepare v2.3.0 beta.1 release
… with .exe suffix Signed-off-by: Apurv Barve <apurvbarve@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: apurv15 <69455689+apurv15@users.noreply.github.com>
Includes: "WCOW: restore support for client-mounted roots", which fixes a nil dereference in createWindowsContainerDocument when starting container with process isolation. full diff: microsoft/hcsshim@v0.14.0-rc.1...v0.15.0-rc.1 Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
…ocker/login-action-4.1.0 build(deps): bump docker/login-action from 4.0.0 to 4.1.0
For Exec format error on Windows, compile cri-integration.test binary with .exe suffix
update runhcs to v0.15.0-rc.1
core/remotes/docker: use SystemCertPool on Windows
Although EROFS has native compression support (and each filesystem can contain multiple compression algorithms), in many cases, people only consider using zstd compression when transporting on the wire in order to reduce the pulling time but maintain the optimal runtime performance. Only `+zstd` is considered: it has skippable frames which will be used for the seekable EROFS implementation in future containerd versions. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Vagrantfile: update DNF cache
Best practice for interpolation of input variables for bash scripts in Github workflows is to preresolve them into strings during an env step to avoid script injection; see docs at https://docs.github.com/en/actions/reference/security/secure-use#use-an-intermediate-environment-variable Signed-off-by: lauralorenz <lauralorenz@google.com>
The RunPodSandbox unconditionally pre-pulls the pause container image via ensurePauseImageExists() before starting any sandbox. However, only the "podsandbox" controller actually uses the pause image to create a pause container holding namespaces. Shim-based sandbox controllers (e.g. Kata Containers) manage the sandbox lifecycle entirely at the shim level and never reference the pause image.

Add a DisablePauseImagePull flag to the Runtime config that gates ensurePauseImageExists(). When a sandboxer is not "podsandbox", the flag skips the unnecessary pre-pull, avoiding wasted network/storage overhead and reducing sandbox startup latency.

The long-term direction is to offload image pulling entirely to the controller implementation (shim level); this flag is an incremental step toward that goal without introducing a breaking behavior change.

Also add unit tests to verify that ensurePauseImageExists is only invoked for the "podsandbox" sandboxer and correctly skipped otherwise.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
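One possible reading of the gate described above, as a sketch only: `runtimeConfig` and `shouldPullPauseImage` are hypothetical stand-ins, not the types or functions from the CRI plugin, and the exact condition in the real change may differ. Here the "podsandbox" sandboxer always pre-pulls, and any other sandboxer skips the pre-pull when the flag is set.

```go
package main

import "fmt"

// runtimeConfig is a hypothetical, trimmed-down stand-in for the CRI
// runtime configuration; only the two fields relevant here are modeled.
type runtimeConfig struct {
	Sandboxer             string
	DisablePauseImagePull bool
}

// shouldPullPauseImage reports whether the pause image should be pre-pulled.
// Assumption: "podsandbox" always needs the image, and other sandboxers
// skip the pull only when the flag is set.
func shouldPullPauseImage(rt runtimeConfig) bool {
	if rt.Sandboxer == "podsandbox" {
		return true
	}
	return !rt.DisablePauseImagePull
}

func main() {
	fmt.Println(shouldPullPauseImage(runtimeConfig{Sandboxer: "podsandbox"}))
	fmt.Println(shouldPullPauseImage(runtimeConfig{Sandboxer: "shim", DisablePauseImagePull: true}))
}
```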
Signed-off-by: Jordan Liggitt <liggitt@google.com>
cri: skip pause image pull for shim sandboxer
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.81.0 to 1.81.1.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.81.0...v1.81.1)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.81.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
…e.golang.org/grpc-1.81.1 build(deps): bump google.golang.org/grpc from 1.81.0 to 1.81.1
Update typeurl/v2 to v2.3.0 to drop gogo dependency
Bumps the otel group with 5 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc](https://github.com/open-telemetry/opentelemetry-go-contrib) | `0.68.0` | `0.69.0` |
| [go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib) | `0.68.0` | `0.69.0` |
| [go.opentelemetry.io/otel/exporters/otlp/otlptrace](https://github.com/open-telemetry/opentelemetry-go) | `1.43.0` | `1.44.0` |
| [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go) | `1.43.0` | `1.44.0` |
| [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp](https://github.com/open-telemetry/opentelemetry-go) | `1.43.0` | `1.44.0` |

Updates `go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc` from 0.68.0 to 0.69.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.68.0...zpages/v0.69.0)

Updates `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` from 0.68.0 to 0.69.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.68.0...zpages/v0.69.0)

Updates `go.opentelemetry.io/otel` from 1.43.0 to 1.44.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.43.0...v1.44.0)

Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace` from 1.43.0 to 1.44.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.43.0...v1.44.0)

Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` from 1.43.0 to 1.44.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.43.0...v1.44.0)

Updates `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp` from 1.43.0 to 1.44.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.43.0...v1.44.0)

Updates `go.opentelemetry.io/otel/sdk` from 1.43.0 to 1.44.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.43.0...v1.44.0)

Updates `go.opentelemetry.io/otel/trace` from 1.43.0 to 1.44.0
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](open-telemetry/opentelemetry-go@v1.43.0...v1.44.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc
  dependency-version: 0.69.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: otel
- dependency-name: go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
  dependency-version: 0.69.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: otel
- dependency-name: go.opentelemetry.io/otel
  dependency-version: 1.44.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: otel
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace
  dependency-version: 1.44.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: otel
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
  dependency-version: 1.44.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: otel
- dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp
  dependency-version: 1.44.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: otel
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.44.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: otel
- dependency-name: go.opentelemetry.io/otel/trace
  dependency-version: 1.44.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: otel
...

Signed-off-by: dependabot[bot] <support@github.com>
The task service guards its containers map with s.mu, and getContainer()
takes it on behalf of effectively every task RPC (State, Connect, Stats,
Wait, Pause, Kill, ...). Create() held s.mu for its whole duration,
including runc.NewContainer(), which runs the actual `runc create`.
`runc create` can be slow on a loaded host. While it runs, any concurrent
task RPC blocks on s.mu. The tasks service applies a 2s timeout to State
(io.containerd.timeout.task.state), so a concurrent State waits on s.mu,
exceeds the deadline, and the ttrpc call is abandoned -- the late shim
reply then shows up as:
ttrpc: received message on inactive stream stream=3
Since deadline errors are now surfaced to clients, this is treated as a
fatal failure and the just-created container is torn down right after
start (observed on Lima/vz: nginx -> Exited (1)).
Move runc.NewContainer() out of the s.mu critical section, mirroring the
runtime v1 shim lock optimization. s.mu is taken only once the container
exists, to guard the map and the remaining (fast) setup, so a slow create
no longer blocks concurrent State and other lookups.
preStart/handleStarted/cleanup only use s.lifecycleMu, so early-exit
handling is unchanged.
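The locking change can be illustrated with a toy model. This is a sketch under assumptions, not the shim code: `service`, `newContainer`, and `demoLookupDuringCreate` are hypothetical, and `newContainer` simply sleeps to stand in for a slow `runc create`. The slow step runs before the mutex is taken, so a concurrent lookup returns promptly.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type container struct{ id string }

// service is a toy model of a task service that guards a map with a mutex.
type service struct {
	mu         sync.Mutex
	containers map[string]*container
}

// newContainer stands in for a slow creation step (hypothetical).
func newContainer(id string, delay time.Duration) *container {
	time.Sleep(delay)
	return &container{id: id}
}

// create runs the slow step without the lock, then takes the lock only
// to publish the result in the map.
func (s *service) create(id string, delay time.Duration) {
	c := newContainer(id, delay)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.containers[id] = c
}

// get is the lookup that every other call path depends on.
func (s *service) get(id string) (*container, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.containers[id]
	return c, ok
}

// demoLookupDuringCreate starts a slow create and checks that a concurrent
// lookup of another container is not delayed by it.
func demoLookupDuringCreate() (lookupFast, created bool) {
	s := &service{containers: map[string]*container{"a": {id: "a"}}}
	done := make(chan struct{})
	go func() {
		s.create("b", 200*time.Millisecond)
		close(done)
	}()
	time.Sleep(20 * time.Millisecond) // give create time to start
	start := time.Now()
	s.get("a")
	lookupFast = time.Since(start) < 100*time.Millisecond
	<-done
	_, created = s.get("b")
	return lookupFast, created
}

func main() {
	fmt.Println(demoLookupDuringCreate())
}
```

A real implementation also has to decide what happens when two creates race for the same ID, which this toy model does not address.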
See lima-vm/lima#5030.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
…304265291e build(deps): bump the otel group across 1 directory with 8 updates
Between starting the sandbox and adding it to the sandbox store, there are opportunities for failures, including in any NRI RunPodSandbox prehooks. This defer covers that period, so if one of those steps fails, the function tries to clean up the sandbox itself. If the sandbox has already been added to the persistent store, the defer does not attempt to stop it, because other components can now recognize it from the CRI store. ShutdownSandbox is used instead of StopSandbox as it both stops the sandbox and cleans up all its directories.

Signed-off-by: lauralorenz <lauralorenz@google.com>
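The deferred-cleanup pattern this commit describes can be sketched as below. All names here (`sandboxStore`, `runSandbox`, `demoRunSandbox`) are hypothetical and heavily simplified; the real RunPodSandbox has many more steps. The sketch shows a defer that inspects the named return error and cleans up only while the sandbox has not yet been handed to the store.

```go
package main

import (
	"errors"
	"fmt"
)

// sandboxStore is a hypothetical stand-in for the persistent sandbox store.
type sandboxStore struct{ ids map[string]bool }

// runSandbox starts a sandbox, runs a pre-hook, then records the sandbox.
// If anything fails before the sandbox is stored, the deferred function
// calls cleanup; once stored, cleanup is left to other components.
func runSandbox(store *sandboxStore, id string, preHook func() error, cleanup func(id string)) (retErr error) {
	stored := false
	defer func() {
		if retErr != nil && !stored {
			cleanup(id)
		}
	}()

	// (starting the sandbox is elided in this sketch)

	if err := preHook(); err != nil {
		return fmt.Errorf("pre-hook failed: %w", err)
	}

	store.ids[id] = true
	stored = true
	return nil
}

// demoRunSandbox reports whether cleanup ran for a failing or passing hook.
func demoRunSandbox(failHook bool) (cleaned bool, err error) {
	store := &sandboxStore{ids: map[string]bool{}}
	hook := func() error {
		if failHook {
			return errors.New("hook rejected sandbox")
		}
		return nil
	}
	err = runSandbox(store, "sb1", hook, func(string) { cleaned = true })
	return cleaned, err
}

func main() {
	fmt.Println(demoRunSandbox(true))
	fmt.Println(demoRunSandbox(false))
}
```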
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Resurrect 2.1 branch for a short period
Update the Fuzzing workflow to upload crash artifacts found during the go_test_fuzz job. Currently, when `go test -fuzz` fails, the crash reproducers are generated but not preserved, making it difficult to diagnose and fix the issues discovered in CI. This change adds an upload-artifact step that captures all files in testdata/fuzz directories across the repository upon failure. Assisted-by: gemini-cli Signed-off-by: Samuel Karp <samuelkarp@google.com> Signed-off-by: lauralorenz <lauralorenz@google.com>
…nectShim integration: deflake TestFailFastWhenConnectShim
runc-shim: don't hold the service lock across runc create
…-reset cri: reset pull progress timer on idle→active transition
The CRI progress reporter cancels an image pull if it sees no progress for 5 seconds. It tracks this through active HTTP requests. During remote fetches, the HTTP response reader is closed via a deferred call after `content.Copy` completes.

Diagnosis: `content.Copy` handles both downloading the stream and committing the writer to the content store. Any delays during the database commit phase (e.g. from database locks, slow disk syncs, or concurrent pull deduplication blocks) keep the HTTP connection open. The progress reporter sees the request is still active (`activeReqs = 1`) but no new bytes are coming in, leading to a premature timeout cancellation.

Reproduction: We reproduced this flakiness deterministically on a GCE VM under a simulated 2 Mbps ingress bandwidth limit using Linux traffic control ingress policing (`tc filter ... action police rate 2mbit`). Under this slowness, the download took longer than the progress timeout during the slow commit phase, triggering context cancellation and failing the `TestCRIImagePullTimeout/HoldingContentOpenWriterWithLocalPull` test.

Solution: To fix this, we wrap the HTTP reader in a `closeOnEOFReader` or `closeOnEOFReadSeeker` before handing it to `content.Copy`. If the underlying connection reader implements `io.Seeker`, it is dynamically wrapped in `closeOnEOFReadSeeker` to forward `Seek` operations. This ensures that O(1) Range seeks are fully preserved during network resumes or retries. The wrappers automatically close the underlying network stream as soon as `Read()` returns `io.EOF` (when the download completes, before the database commit begins). This drops `activeReqs` to `0` early, freeing the socket and preventing progress timeouts during commits. A `sync.Once` ensures that subsequent deferred `Close()` calls do not double-decrement the reporter.

How it was tested: Verified the fix on a GCE VM under a simulated 2 Mbps ingress bandwidth limit. Verified seeker safety via standalone logic audits and trace proofs.

Assisted-by: Antigravity
Signed-off-by: Samuel Karp <samuelkarp@google.com>
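A minimal sketch of the close-on-EOF idea follows. It is not the implementation from this commit: the field names, the `countingCloser` test double, and `demoCloseOnEOF` are assumptions, and the seeker-forwarding variant is omitted. It shows the underlying stream being closed as soon as `Read` reports `io.EOF`, with `sync.Once` making a later deferred `Close` a no-op.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"sync"
)

// closeOnEOFReader closes the wrapped stream the first time Read returns
// io.EOF. sync.Once guarantees the underlying Close runs exactly once,
// even if the caller also defers Close.
type closeOnEOFReader struct {
	rc   io.ReadCloser
	once sync.Once
	err  error
}

func (r *closeOnEOFReader) Read(p []byte) (int, error) {
	n, err := r.rc.Read(p)
	if err == io.EOF {
		r.Close() // release the stream before any slow post-processing
	}
	return n, err
}

func (r *closeOnEOFReader) Close() error {
	r.once.Do(func() { r.err = r.rc.Close() })
	return r.err
}

// countingCloser is a test double that counts Close calls.
type countingCloser struct {
	io.Reader
	closes int
}

func (c *countingCloser) Close() error {
	c.closes++
	return nil
}

// demoCloseOnEOF reads a stream to EOF, then calls Close again, and
// reports how many times the underlying stream was closed at each point.
func demoCloseOnEOF() (data string, closesAfterRead, closesAfterClose int) {
	src := &countingCloser{Reader: strings.NewReader("layer bytes")}
	r := &closeOnEOFReader{rc: src}
	b, _ := io.ReadAll(r)
	closesAfterRead = src.closes
	r.Close()
	return string(b), closesAfterRead, src.closes
}

func main() {
	fmt.Println(demoCloseOnEOF())
}
```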
The TestCRIImagePullTimeout test case "NoDataTransferred" flaked under constrained networks because the test proxy mirror registry used a blocking ReadAtLeast call to forward bytes to containerd. This blocking wait (up to 4KB) meant the mirror registry server completely stopped forwarding data during network slowness, triggering containerd's aggressive 5-second progress timeout and canceling the pull before it could reach its 3MB circuit-breaker limit.

This is resolved by changing the proxy's custom copy loop from io.ReadAtLeast(src, buf, len(buf)) to standard src.Read(buf). This streams network chunks to containerd immediately as they arrive, preventing false timeout cancellations while maintaining correct circuit-breaker byte tracking.

Assisted-by: Antigravity
Signed-off-by: Samuel Karp <samuelkarp@google.com>
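The difference between the two copy loops can be sketched as follows. This is an illustration with hypothetical names (`forward`, `countingWriter`, `demoForward`), not the test proxy's code, and the circuit breaker is reduced to a running byte total. `iotest.OneByteReader` simulates a slow source that yields one byte per `Read`; with plain `src.Read`, every chunk is forwarded as soon as it arrives instead of waiting for the buffer to fill.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"testing/iotest"
)

// forward copies src to dst using plain Read calls, writing each chunk as
// soon as it arrives, and returns the total number of bytes forwarded.
func forward(dst io.Writer, src io.Reader) (int64, error) {
	buf := make([]byte, 4096)
	var total int64
	for {
		n, err := src.Read(buf)
		if n > 0 {
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return total, werr
			}
			total += int64(n)
		}
		if err == io.EOF {
			return total, nil
		}
		if err != nil {
			return total, err
		}
	}
}

// countingWriter records how many separate writes it receives.
type countingWriter struct {
	buf    strings.Builder
	writes int
}

func (w *countingWriter) Write(p []byte) (int, error) {
	w.writes++
	return w.buf.Write(p)
}

// demoForward feeds a one-byte-at-a-time source through forward.
func demoForward() (writes int, total int64, out string) {
	src := iotest.OneByteReader(strings.NewReader("hello"))
	dst := &countingWriter{}
	total, _ = forward(dst, src)
	return dst.writes, total, dst.buf.String()
}

func main() {
	fmt.Println(demoForward())
}
```

Each of the five bytes produces its own write here, whereas a loop built on `io.ReadAtLeast(src, buf, len(buf))` would hold them back until the buffer filled or the source ended.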
Signed-off-by: Derek McGowan <derek@mcg.dev>
remotes: close fetch reader immediately on EOF
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
CI: update Fedora to 44
Add max size label for snapshots
…rpolation Use intermediate env variables for bash script runners in github workflows
Upload crash artifacts from go test -fuzz when failed
Add defer in event of mid-function failures in RunPodSandbox to avoid mount leaks
GHA runners occasionally experience I/O constraints during root-test test execution. While concurrent tests rapidly allocate loopback devices, background udev probing stalls. This quickly exhausts systemd-udevd's default worker pool ceiling (20 children max), stalling netlink uevent processing so device-mapper device nodes are never created for subsequent dm-verity test execution.

Logging cgroups v2 pids.peak telemetry confirmed peak in-flight udev workers accumulate to 325 during test execution. Raising the children-max limit to 500 provides comfortable buffer room so udevd freely spawns worker processes without entering event lockup or causing test timeouts.

Assisted-by: Antigravity
Signed-off-by: Chris Henzie <chrishenzie@gmail.com>
Configure udevd children-max for root-test
See Commits and Changes for more details.
Created by
pull[bot]
Can you help keep this open source service alive? 💖 Please sponsor : )